Abstract

We propose a sophisticated tying mechanism for modeling deletion transformations between dialects. We empirically show that the proposed tying mechanism reduces deletion errors by 33% when compared to a baseline system using a standard tying mechanism. Statistical tests show that the proposed and baseline models make statistically different errors, thus suggesting that they are complementary systems in dialect recognition tasks. Pronunciation rules learned by our proposed system quantify the occurrence frequency of known rules, and suggest rule candidates for further linguistic studies.

Index Terms: pronunciation model

*This work is sponsored by the Command, Control and Interoperability Division (CID), which is housed within the Department of Homeland Security's Science and Technology Directorate under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

1. Introduction

While many dialect recognition systems take advantage of phonotactic differences across dialects, most of these systems do not focus on characterizing linguistically interpretable results. Exceptions include [2, 10, 4, 5]. In [2], acoustic differences caused by phonetic context were used to infer underlying phonetic rules. In [4, 5], discriminative classifiers are trained to recognize dialects, and the N-grams or context-dependent phones that are helpful in dialect recognition are discussed. This line of work has important applications in forensic phonetics [1].

In our previous work [10], we proposed a pronunciation model that characterizes phonetic transformations across dialects. We adopted standard triphone state clustering techniques used in ASR to model context-dependent phonetic transformations across dialects. In this work, we refine our previous model to characterize deletion transformations more appropriately. We show that deletion errors are reduced by 33% compared to our previous standard tying system [10].

2. Method

2.1. Pronunciation Model

We used an HMM system for our previously proposed pronunciation model. The reference dialect's pronunciation is modeled by the states, and the pronunciation of the dialect of interest is modeled by the observations emitted by the states. Phonetic transformations (deletion, insertion, and substitution) between the two dialects are modeled by state transition probabilities.

2.1.1. HMM Architecture

States. Suppose the reference phone sequence is $C = c_1, c_2, \ldots, c_n$. Each reference phone $c_i$ corresponds to two states, a normal state $s_{2i-1}$ followed by an insertion state $s_{2i}$. Therefore, the states corresponding to the reference phone sequence $C$ are $S = s_1, s_2, \ldots, s_{2n}$. $Q = q_1, q_2, \ldots, q_T$ represents a possible state transition path taken in $S$. $Q$ takes on values of states in $S$ in monotonic order:

$$ \text{if } q_t = s_i \text{ and } q_{t+1} = s_j, \text{ then } i \le j. \tag{1} $$

The probability of being in state $x$ and emitting observation $v_k$ at time $t$ is

$$ B_x(k) = P(o_t = v_k \mid q_t = x), \tag{2} $$

where $1 \le x \le N$, $1 \le k \le M$.

When traversing all possible state transition paths of $S$, the probability of $s_i$, which corresponds to state $x$, emitting $v_k$ is

$$ b_i(o_t) = B_x(k), \quad o_t = v_k, \tag{3} $$

where $s_i = x$, $1 \le i \le 2n$.
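A minimal Python sketch of the state layout, the monotonic-path constraint of Eq. (1), and the tied emission lookup of Eq. (3) follows. The helper names, the 0-based indexing, and the representation of a model state $x$ as a (phone, kind) pair are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

State = Tuple[str, str]  # hypothetical model-state id x: (phone, "normal" | "insertion")


def build_states(ref_phones: Sequence[str]) -> List[State]:
    """Map C = c_1..c_n to S = s_1..s_2n.

    Phone c_i yields a normal state followed by an insertion state
    (1-based s_{2i-1}, s_{2i}; 0-based indices 2i-2 and 2i-1 here).
    """
    states: List[State] = []
    for phone in ref_phones:
        states.append((phone, "normal"))
        states.append((phone, "insertion"))
    return states


def is_monotonic(path: Sequence[int]) -> bool:
    """Eq. (1): if q_t = s_i and q_{t+1} = s_j, then i <= j."""
    return all(i <= j for i, j in zip(path, path[1:]))


def emission_prob(B: Dict[State, Dict[str, float]], states: Sequence[State],
                  i: int, obs: str) -> float:
    """Eq. (3): b_i(o_t = v_k) = B_x(k), where x is the model state at position i."""
    return B.get(states[i], {}).get(obs, 0.0)


if __name__ == "__main__":
    S = build_states(["p", "aa", "r", "k"])          # 8 states for 4 phones
    B = {("aa", "normal"): {"aa": 0.9, "ae": 0.1}}   # toy emission table
    # Path p -> aa -> k visits normal states 0, 2, 6 (skipping /r/ at 4).
    print(len(S), is_monotonic([0, 2, 6]), emission_prob(B, S, 2, "aa"))  # 8 True 0.9
```

Because every position $i$ whose model state is the same $x$ reads the same row of `B`, emission parameters are shared across positions in the way Eq. (3) describes.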

State Transitions. There are four types of state transitions: insertion, self-insertion, deletion, and typical transitions. State transition types are represented by $r \in \{\mathrm{ins}, \mathrm{sel}, \mathrm{del}, \mathrm{typ}\}$.

The state transition probability from state $x$ to state $y$ through transition arc type $r$ is

$$ A^r_{xy} = P(q_{t+1} = y, r \mid q_t = x), \tag{4} $$

where $1 \le x, y \le N$, $r \in \{\mathrm{ins}, \mathrm{sel}, \mathrm{del}, \mathrm{typ}\}$, and

$$ \sum_{r} \sum_{y} A^r_{xy} = 1. $$

When traversing all possible state transition paths of $S$, the probability of transitioning from state $s_i$ to state $s_j$ in $S$ through transition type $r$ is

$$ a^r_{ij} = A^r_{xy}, \tag{5} $$

where $1 \le i, j \le 2n$, $s_i = x$, $s_j = y$. Note that if $r = \mathrm{sel}$, then $i = j$.
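The arc types can be illustrated with a hypothetical topology. The text fixes only the four arc types, the monotonic order of Eq. (1), and the fact that self-insertion arcs satisfy $i = j$; which arcs leave a normal versus an insertion state, and that a deletion arc skips exactly one normal state, are assumptions made for this sketch. Start and end handling of the sequence is omitted.

```python
from typing import Dict, List, Tuple

Arc = Tuple[int, int, str]  # (i, j, r) over 0-based positions of S


def enumerate_arcs(n_phones: int) -> List[Arc]:
    """Enumerate arcs under an ASSUMED topology (not specified in the text).

    Positions: normal state of phone m at 2m, its insertion state at 2m + 1.
    - normal --ins--> its own insertion state
    - insertion --sel--> itself (self-insertion, so i == j)
    - normal / insertion --typ--> the next phone's normal state
    - normal --del--> the normal state two phones ahead (skips one normal state)
    """
    arcs: List[Arc] = []
    last = 2 * n_phones - 1
    for m in range(n_phones):
        normal, insertion, nxt = 2 * m, 2 * m + 1, 2 * m + 2
        arcs.append((normal, insertion, "ins"))
        arcs.append((insertion, insertion, "sel"))
        if nxt <= last:
            arcs.append((normal, nxt, "typ"))
            arcs.append((insertion, nxt, "typ"))
        if nxt + 2 <= last:
            arcs.append((normal, nxt + 2, "del"))
    return arcs


def tied_arc_prob(A: Dict[Tuple[str, str, str], float], labels: List[str],
                  i: int, j: int, r: str) -> float:
    """Eq. (5): a^r_ij = A^r_xy with x = labels[i] and y = labels[j]."""
    return A.get((labels[i], labels[j], r), 0.0)


if __name__ == "__main__":
    arcs = enumerate_arcs(3)
    print(len(arcs), [a for a in arcs if a[2] == "del"])
```

Every generated arc satisfies $i \le j$, and only the `sel` arcs have $i = j$, so the sketch respects both Eq. (1) and the note after Eq. (5).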

2.1.2.  Decision  Tree  Clustering 

In the context of our pronunciation model, a decision tree is grown for each state $s$, where $1 \le s \le N$. At each node $k$ of the tree, a list of attributes is used to split the data into two subgroups. The attribute $H_k$ that generates the best split is chosen to split the data into child nodes. This splitting process is applied recursively until a stopping criterion is reached. The best split is determined by an objective function such as the log likelihood increase or the information gain.

Assume $P_k(j)$ is the probability that state $s$ emits observation $v_j$ at node $k$, where attribute $H_k$ specifies the subgroups of $s$ that belong to node $k$. The likelihood of state $s$ emitting observation $v_j$ at node $k$ is $L(o = v_j \mid s \in H_k) = P_k(j)$. The maximum likelihood estimate of $P_k(j)$ is simply the observed relative frequency of observation $v_j$ at node $k$: $P_k(j) = n_k(j)/n_k$, where $n_k(j)$ is the expected number of times $v_j$ occurred at node $k$, and $n_k = \sum_j n_k(j)$. The total likelihood at node $k$ is

$$ L(O_k \mid s \in H_k) = \prod_{j=1}^{M} P_k(j)^{n_k(j)}. \tag{6} $$

Suppose nodes $k_1$ and $k_2$ are the children of node $k$. Then the log likelihood increase from splitting node $k$ into nodes $k_1$ and $k_2$ is

$$ \Delta \log L = \log \frac{\prod_{i=1,2} L(O_{k_i} \mid s \in H_{k_i})}{L(O_k \mid s \in H_k)}. \tag{7} $$
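The split criterion of Eqs. (6) and (7) needs only the expected counts $n_k(j)$ at each node. The sketch below computes $\log L$ with the convention $0 \log 0 = 0$ and derives the parent counts from the two children; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def node_log_likelihood(counts: Sequence[float]) -> float:
    """log of Eq. (6): sum_j n_k(j) * log(n_k(j) / n_k), with 0 * log 0 = 0."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)


def split_gain(left: Sequence[float], right: Sequence[float]) -> float:
    """Eq. (7): log-likelihood increase from splitting node k into children k1, k2.

    Both children must index the same M observation symbols; the parent's
    counts are their element-wise sum.
    """
    if len(left) != len(right):
        raise ValueError("children must share the observation inventory")
    parent: List[float] = [a + b for a, b in zip(left, right)]
    return node_log_likelihood(left) + node_log_likelihood(right) - node_log_likelihood(parent)


if __name__ == "__main__":
    # Toy expected counts over M = 2 symbols; the split separates them perfectly.
    print(round(split_gain([10.0, 0.0], [0.0, 10.0]), 3))  # 20 * ln 2, printed as 13.863
```

A split whose children have the same relative frequencies as the parent yields a gain of zero, so the tree only grows where an attribute actually separates the emission distributions.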


2.1.3.  Standard  Triphone  Tying  Mechanism 

Standard tying is similar to how triphone states are tied in automatic speech recognition [6]. Suppose attribute $H_{fk}$ in the decision tree model corresponds to feature $f$ being present ($k = 1$) or absent ($k = 2$) in the contextual phones of a triphone state.

The log likelihood of a group of clustered triphone states is computed using the expected number of emissions of these triphones. The expected number of emissions of triphone states $q_t$ that correspond to attribute $H_{fk}$ emitting observation $v_j$ is

$$ E(v_j \mid q_t \in H_{fk}) = \sum_{t=1}^{T} P(O, q_t \in H_{fk} \mid \lambda, S)\, \delta(o_t, v_j), \tag{8} $$

where $S$ are the states, the $q_t$ all share the same center phone, $k \in \{1, 2\}$, and


$$ \delta(o_t, v_j) = 1 \quad \text{if } o_t = v_j, \tag{9} $$

$$ \delta(o_t, v_j) = 0 \quad \text{otherwise}. \tag{10} $$
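Eqs. (8) to (10) amount to accumulating occupancy-weighted indicator counts. In this sketch the per-frame weights `occupancy[t]` are assumed to be precomputed (for example, by the forward-backward algorithm) for the event $q_t \in H_{fk}$; using posteriors instead of the joint $P(O, q_t \in H_{fk} \mid \lambda, S)$ only rescales all counts of an utterance by the constant $1/P(O \mid \lambda, S)$. The function name is hypothetical.

```python
from typing import Dict, Sequence


def expected_emission_counts(occupancy: Sequence[float],
                             observations: Sequence[str]) -> Dict[str, float]:
    """Eqs. (8)-(10): E(v_j | q_t in H_fk) = sum_t gamma_t * delta(o_t, v_j).

    occupancy[t] is an assumed, precomputed weight for "q_t is one of the
    triphone states in H_fk"; delta() is realized by adding gamma_t only to
    the count of the symbol actually observed at time t.
    """
    if len(occupancy) != len(observations):
        raise ValueError("occupancy and observations must be aligned in time")
    counts: Dict[str, float] = {}
    for gamma, obs in zip(occupancy, observations):
        counts[obs] = counts.get(obs, 0.0) + gamma
    return counts


if __name__ == "__main__":
    print(expected_emission_counts([0.5, 1.0, 0.25], ["aa", "aa", "ae"]))
```

Symbols that never occur while $q_t \in H_{fk}$ simply do not appear in the returned dictionary, which corresponds to an expected count of zero.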

The total likelihood of $q_t \in H_{fk}$ is

$$ L(O \mid q_t \in H_{fk}) = \prod_{j=1}^{M} \left\{ \frac{E(v_j \mid q_t \in H_{fk})}{\sum_{j'} E(v_{j'} \mid q_t \in H_{fk})} \right\}^{E(v_j \mid q_t \in H_{fk})}. \tag{11} $$

After state clustering, assume triphone states are clustered into $I$ groups. Group $i$ is specified by $G_i = (C_l, C_m, C_r)$, where $C_l$ specifies the left context state, $C_m$ specifies the center (middle) state, and $C_r$ specifies the right context state.

The model estimation equations still have the same form as in a typical HMM system [7]:

$$ A^r_{G_i} = \frac{\sum_{t=1}^{T} P(O, q_{t-1} \in G_i, r \mid \lambda, S)}{\sum_{r'} \sum_{t=1}^{T} P(O, q_{t-1} \in G_i, r' \mid \lambda, S)}, \tag{12} $$

$$ B_{G_i}(k) = \frac{\sum_{t=1}^{T} P(O, q_t \in G_i \mid \lambda, S)\, \delta(o_t, v_k)}{\sum_{t=1}^{T} P(O, q_t \in G_i \mid \lambda, S)}. \tag{13} $$

Limitations. The standard triphone tying mechanism makes two assumptions about deletion rules. (1) If a phone is deleted, the pronunciation of its preceding phone will be affected, and this change can be characterized phonetically through automatic phone recognition or manual phone transcriptions. (2) The phone following the deleted phone does not characterize when deletions occur. These assumptions might be over-simplifications that only apply to certain deletion rules. For example, one difference between General American English (GAE) and Received Pronunciation (RP) in British English is that the former is rhotic while the latter is not. Rhotic speakers pronounce /r/ in all positions, while non-rhotic speakers pronounce /r/ only if it is followed by a vowel sound in the same syllable. For instance, the word park (/p aa r k/) in American English will sound like pak ([p aa: k]¹) in RP, since /r/ is followed by the consonant /k/. Clearly, this non-rhotic rule does not comply with assumption (2). While the vowel before /r/, /aa/, does change by becoming longer ([aa:]), this phenomenon might be too subtle to characterize practically in automated systems, and might not hold for all deletion transformations across dialects.

In addition, since our model represents deletions by deletion transition arcs that skip states (so the deleted states do not emit anything), it is more appropriate to use arc clustering instead of traditional state clustering to determine the tying structure.

2.1.4. Sophisticated Tying Mechanism

A state transition arc is specified by its origin state and its destination state. In the case of deletion arcs, the normal states that are skipped during the transition also characterize the state transition arc.

Consider triphone state $s_{k-1} - s_k - s_{k+1}$. The expected count of state $x$ being deleted when $q_t$ corresponds to attribute $H_{fk}$ is

$$ E_{d=x} = \sum_{t=1}^{T} \sum_{d=x} P(q_{t+1}, r = \mathrm{del}, d \mid q_t \in H_{fk}), \tag{14} $$

where $d$ represents the deleted state. Similarly, the expected count of state $x$ not being deleted is

$$ E_{d \neq x} = \sum_{t=1}^{T} \sum_{r} P(q_t = x, r \mid q_t \in H_{fk}), \tag{15} $$

since state $x$ cannot have been deleted if there were transition arcs leaving it.

The total likelihood of $q_t$ corresponding to attribute $H_{fk}$ is

$$ L(x \mid q_t \in H_{fk}) = \left( \frac{E_{d=x}}{E_{d=x} + E_{d \neq x}} \right)^{E_{d=x}} \left( \frac{E_{d \neq x}}{E_{d=x} + E_{d \neq x}} \right)^{E_{d \neq x}}. \tag{16} $$
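Eq. (16) is a Bernoulli likelihood in which each visit either deletes $x$ or does not. A sketch of its logarithm and of the resulting split gain, in the sense of Eq. (7), is given below; the function names are hypothetical, and the counts in the example are invented for illustration, loosely mirroring the non-rhotic /r/ example of Section 2.1.3.

```python
import math
from typing import Tuple

Counts = Tuple[float, float]  # (E_{d=x}, E_{d!=x})


def deletion_log_likelihood(e_del: float, e_keep: float) -> float:
    """log of Eq. (16), using the convention 0 * log 0 = 0."""
    total = e_del + e_keep
    ll = 0.0
    for c in (e_del, e_keep):
        if c > 0:
            ll += c * math.log(c / total)
    return ll


def deletion_split_gain(left: Counts, right: Counts) -> float:
    """Eq. (7) applied to Eq. (16): gain of splitting a node into two children."""
    parent = (left[0] + right[0], left[1] + right[1])
    return (deletion_log_likelihood(*left) + deletion_log_likelihood(*right)
            - deletion_log_likelihood(*parent))


if __name__ == "__main__":
    # Hypothetical /r/ counts (deleted, kept): before a consonant vs. before a vowel.
    before_consonant, before_vowel = (9.0, 1.0), (1.0, 9.0)
    print(deletion_split_gain(before_consonant, before_vowel) > 0)
```

Because the counts come from deletion arcs rather than from emissions, a question about the right-context phone can win the split, which is exactly what the standard tying of Section 2.1.3 cannot express.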


The log likelihood increase in decision tree clustering can thus be computed to determine the attributes of each group of clustered deletion arcs. After arc clustering, assume deletion arcs are clustered into $J$ groups. Group $j$ is specified by $D_j = (\sigma_j, \varsigma_j, \tau_j)$, where $\sigma_j$ specifies the source of the arc, $\varsigma_j$ specifies the skipped state, and $\tau_j$ specifies the target of the arc. The model estimation equation for deletion transitions belonging to clustered group $D_j$ is similar to Eq. (12):

$$ A_{D_j} = \frac{\sum_{t=1}^{T} P(O, q_{t-1} \in \sigma_j, r = \mathrm{del}, d \in \varsigma_j, q_t \in \tau_j \mid \lambda, S)}{\sum_{t=1}^{T} \sum_{r'} P(O, q_{t-1} \in \sigma_j, r' \mid \lambda, S)}. \tag{17} $$

¹ [aa:] represents a long [aa].

Table 1: WSJCAM0 data partition


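Suppose we want to compute the tied probabilities of the triphone state $s_{k-1} - s_k - s_{k+1}$, where $s_{k-1} \in C_l$, $s_k \in C_m$, and $s_{k+1} \in C_r$. We first compute all the clustered deletion probabilities originating from $s_k$. Then we estimate the typical and insertion transition probabilities as in the standard tying case, using a new Lagrange constraint.

The sum of all deletion probabilities leaving triphone state $s_{k-1} - s_k - s_{k+1}$ is

$$ P_d = P(q_{t+1} = s_{k+2}, r = \mathrm{del} \mid q_t = s_k) \tag{18} $$

$$ = P(q_{t+1} = s_{k+2} \in \tau_j, r = \mathrm{del} \mid q_t = s_k, s_{k+1} \in \varsigma_j) \tag{19} $$

$$ = P(s_{k+1} \in \varsigma_j, q_{t+1} = s_{k+2}, r = \mathrm{del} \mid q_t = s_k), \tag{20} $$

where $P(q_{t+1} = s_{k+2} \in \tau_j, r = \mathrm{del} \mid q_t = s_k, s_{k+1} \in \varsigma_j)$ was already computed in Eq. (17) as $A_{D_j}$.

The typical and insertion transition probabilities of $s_k$, whose tied group is $G_i$, are then

$$ A^r_{s_k} = \frac{\sum_{t=1}^{T} P(O, q_{t-1} \in G_i, r \mid \lambda, S)}{\sum_{r' \in \{\mathrm{typ}, \mathrm{ins}\}} \sum_{t=1}^{T} P(O, q_{t-1} \in G_i, r' \mid \lambda, S)} \, (1 - P_d), \tag{21} $$

where $\xi = \sigma_1 \cup \sigma_2 \cup \cdots \cup \sigma_J$, $\xi^c = \sigma_1^c \cap \sigma_2^c \cap \cdots \cap \sigma_J^c$, and $r \in \{\mathrm{typ}, \mathrm{ins}\}$.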
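Eqs. (12), (13), and (17) pool expected counts over all members of a cluster before dividing, and Eq. (21) then shares the probability mass left over after deletion, $1 - P_d$, between typical and insertion arcs. The sketch below illustrates both steps under the assumption that the pooled expected counts are already available; the function names and example counts are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def pooled_ratio(members: Iterable[Tuple[float, float]]) -> float:
    """Tied estimate in the style of Eqs. (12), (13), (17): sum of numerator
    counts over sum of denominator counts across all cluster members."""
    num = den = 0.0
    for n, d in members:
        num += n
        den += d
    if den <= 0:
        raise ValueError("cluster has no expected occupancy")
    return num / den


def tied_transitions(p_del: float, typ_count: float, ins_count: float) -> Dict[str, float]:
    """Eq. (21): split the remaining mass 1 - P_d between typ and ins in
    proportion to their pooled expected counts (r in {typ, ins})."""
    if not 0.0 <= p_del <= 1.0:
        raise ValueError("P_d must be a probability")
    denom = typ_count + ins_count
    if denom <= 0:
        raise ValueError("no typ/ins counts")
    return {
        "del": p_del,
        "typ": typ_count / denom * (1.0 - p_del),
        "ins": ins_count / denom * (1.0 - p_del),
    }


if __name__ == "__main__":
    # Two hypothetical deletion arcs in one cluster D_j:
    # (expected deletions, expected departures from the arc source).
    a_dj = pooled_ratio([(3.0, 10.0), (1.0, 30.0)])   # 4 / 40
    # Eqs. (18)-(20): P_d for s_k is read off the A_{D_j} of its deletion arc.
    probs = tied_transitions(a_dj, typ_count=30.0, ins_count=10.0)
    print(a_dj, probs)
```

Pooling weights each member by its occupancy: the two arcs above give $4/40 = 0.1$, whereas averaging their individual ratios (0.3 and about 0.033) would give about 0.167. The returned typical and insertion probabilities sum to $1 - P_d$, so all three sum to one.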


2.2.  Statistical  Test 

We used the matched pairs test in [9] to evaluate whether the performance difference between the two systems being compared is statistically significant.

Suppose that we can divide the output stream from a pronunciation model system into segments in such a way that the errors in one segment are statistically independent of the errors in any other segment. Suppose we are comparing the performance of systems $S_1$ and $S_2$. Let $N_1^i$ be the number of errors made on the $i$-th segment by system $S_1$, and $N_2^i$ the number of errors made on that segment by system $S_2$. Note that the type of error is unimportant, as long as the method of counting errors is consistent for each segment and for both systems.

Let $Z_i = N_1^i - N_2^i$, $i = 1, \ldots, n$, where $n$ is the number of segments. Let $\mu_z$ be the unknown average difference in the number of errors in a segment made by the two systems. We would like to ascertain whether $\mu_z = 0$. The maximum likelihood estimate of $\mu_z$ and the sample variance of the $Z_i$ are

$$ \hat{\mu}_z = \frac{1}{n} \sum_{i=1}^{n} Z_i, \tag{22} $$

$$ \hat{\sigma}_z^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Z_i - \hat{\mu}_z)^2. \tag{23} $$

If $W$ is defined as

$$ W = \frac{\hat{\mu}_z}{\hat{\sigma}_z / \sqrt{n}}, \tag{24} $$

then, if $n$ is large enough, $W$ approximately follows a standard normal distribution $\mathcal{N}(0, 1)$. We can test the null hypothesis $H_0: \mu_z = 0$ by computing $P = 2 \Pr(Z \ge |w|)$, where $Z$ is a random variable with distribution $\mathcal{N}(0, 1)$ and $w$ is the realized value of $W$.

Set     Number of speakers   Duration
Train   92                   15.3 hr
Dev     48                   4 hr
Test    48                   4 hr
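The matched pairs test of Eqs. (22) to (24) reduces to a few lines. The sketch below uses the identity $2\Pr(Z \ge |w|) = \operatorname{erfc}(|w|/\sqrt{2})$ for a standard normal $Z$, via Python's `math.erfc`; the function name is hypothetical, and the toy segment counts only illustrate the arithmetic, since five segments are far too few for the normal approximation.

```python
import math
from typing import Sequence, Tuple


def matched_pairs_test(n1: Sequence[int], n2: Sequence[int]) -> Tuple[float, float]:
    """Return (w, p) for Z_i = N_1^i - N_2^i following Eqs. (22)-(24)."""
    if len(n1) != len(n2) or len(n1) < 2:
        raise ValueError("need at least two paired segments")
    z = [a - b for a, b in zip(n1, n2)]
    n = len(z)
    mu = sum(z) / n                                   # Eq. (22)
    var = sum((zi - mu) ** 2 for zi in z) / (n - 1)   # Eq. (23)
    if var == 0.0:
        raise ValueError("zero variance: W is undefined")
    w = mu / math.sqrt(var / n)                       # Eq. (24)
    p = math.erfc(abs(w) / math.sqrt(2.0))            # 2 * Pr(Z >= |w|), Z ~ N(0, 1)
    return w, p


if __name__ == "__main__":
    w, p = matched_pairs_test([3, 2, 4, 3, 5], [1, 2, 2, 1, 3])
    print(round(w, 3), p < 0.001)
```

Swapping the two systems flips the sign of $w$ but leaves the two-sided $p$-value unchanged.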


3.  Experiments 

3.1.  Assumptions  and  Protocol 

We adapt the assumptions in [10] as follows.

1. All pronunciation variations across dialects are governed by underlying phonetic rules.

2. The ground-truth surface phones of the WSJCAM0 corpus are the phonetic transcriptions it provides.

3. The ability to predict ground-truth surface phones using the trained pronunciation models indicates how well the underlying phonetic rules are retrieved by the pronunciation model algorithms.

3.2.  Data 

The speech database used is WSJCAM0, the UK English equivalent of a subset of the US American English WSJ0 database [11]. The data partition of WSJCAM0 is listed in Table 1.

3.3. Implementation Details

3.3.1.  Pronunciation  Model 

Given the trained pronunciation model, we generate the most likely observations given the reference phones, and compare the generated observations with the ground-truth observations provided by the phone transcriptions in WSJCAM0. Reference phones are determined by the American English dictionary given the text.
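Counting deletion, insertion, and substitution errors requires aligning the generated phones with the ground-truth phones. The text does not state which aligner was used; the sketch below assumes a standard Levenshtein alignment with unit costs, in which a deletion is a ground-truth phone missing from the generated output and an insertion is an extra generated phone. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def error_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (deletions, insertions, substitutions) on a minimum-edit alignment.

    ref: ground-truth surface phones; hyp: generated surface phones.
    Each DP cell stores (total, del, ins, sub); ties are broken by tuple order.
    """
    n, m = len(ref), len(hyp)
    cell: List[List[Tuple[int, int, int, int]]] = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cell[i][0] = (i, i, 0, 0)           # ground-truth phones with no output: deletions
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, j, 0)           # output phones with no ground truth: insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, d, ins, s = cell[i - 1][j - 1]
            diag = (t, d, ins, s) if ref[i - 1] == hyp[j - 1] else (t + 1, d, ins, s + 1)
            t, d, ins, s = cell[i - 1][j]
            up = (t + 1, d + 1, ins, s)     # ref[i-1] deleted
            t, d, ins, s = cell[i][j - 1]
            left = (t + 1, d, ins + 1, s)   # hyp[j-1] inserted
            cell[i][j] = min(diag, up, left)
    _, d, ins, s = cell[n][m]
    return d, ins, s


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER in percent: (D + I + S) / number of ground-truth phones."""
    if not ref:
        raise ValueError("empty reference")
    return 100.0 * sum(error_counts(ref, hyp)) / len(ref)


if __name__ == "__main__":
    # A generated /r/ that the ground truth lacks counts as one insertion.
    print(error_counts(["p", "aa", "k"], ["p", "aa", "r", "k"]))
```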

3.3.2.  Statistical  Test 

We divided the generated surface phone outputs into segments where no errors occurred for some minimal time period T ("good" segments) and segments where errors occur ("bad" segments), following [9]. T is required to be sufficiently long to ensure that, after a good segment, the first error in a bad segment is independent of any previous errors. T was swept on the development set (ranging from 9 to 402), and all values resulted in similar p-values (p ≪ 0.001) on the test set. The number of segments n ranged from 756 to 32,491, which is assumed to be large enough for W to be normally distributed and for a good estimate of the variance of Z_i to be obtained. Errors were divided into deletion, insertion, and substitution, and each type of error was analyzed separately.
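The segmentation rule can be sketched as follows, assuming per-position error counts for both systems on a common axis (for example, ground-truth phone positions) and assuming that only the "bad" segments enter the test, each contributing the pair $(N_1^i, N_2^i)$. Both assumptions, and the helper name, are illustrative rather than taken from [9].

```python
from typing import List, Sequence, Tuple


def bad_segments(err1: Sequence[int], err2: Sequence[int], T: int) -> List[Tuple[int, int]]:
    """Split an aligned error stream into "bad" segments.

    err1[p], err2[p]: number of errors each system makes at position p.
    A bad segment starts at an error by either system and closes once T
    consecutive error-free positions follow; returns (N_1^i, N_2^i) per segment.
    """
    if len(err1) != len(err2):
        raise ValueError("error streams must be aligned")
    if T < 1:
        raise ValueError("T must be at least 1")
    segments: List[Tuple[int, int]] = []
    n1 = n2 = clean_run = 0
    in_bad = False
    for e1, e2 in zip(err1, err2):
        if e1 or e2:
            in_bad, clean_run = True, 0
            n1 += e1
            n2 += e2
        elif in_bad:
            clean_run += 1
            if clean_run >= T:              # a good stretch of length T closes the segment
                segments.append((n1, n2))
                n1 = n2 = clean_run = 0
                in_bad = False
    if in_bad:                              # stream ended inside a bad segment
        segments.append((n1, n2))
    return segments


if __name__ == "__main__":
    print(bad_segments([1, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1], T=2))
```

A larger T merges nearby error clusters into fewer, longer segments, which is the trade-off the sweep over T on the development set explores.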

3.4. Phone Error Rate Results

The phone error rates (PER) between the ground-truth surface phones and the generated surface phones of each system are listed in Table 2. The sub-error categories of deletion, insertion, and substitution are also listed in Table 2. The relative improvement of all systems compared to the baseline monophone system is listed in Table 3. All improvements are shown to be statistically significant (p ≪ 0.0001) according to the matched pairs test described in Section 2.2.

Table 2: Phone error rate (PER) for each system. Units are in %. Total number of phones in the test set: 299,853.

System                PER   Deletion   Insertion   Substitution
Standard Tying        9.0   2.1        1.9         5.0
Sophisticated Tying   9.0   1.4        2.6         5.0

Table 3: Relative PER improvement (compared to the baseline monophone system). Units are in %.

4. Discussion

4.1. Context-Dependent Systems vs. Monophone System

All systems that exploit context information outperformed the baseline monophone system by 40% relative (p ≪ 0.001). These results verify that phonetic context information is important in characterizing dialect differences, as reported in [10, ?].

If we break down the PER into the sub-error categories of insertion, deletion, and substitution, we see statistically significant improvements for all categories in all systems, except for deletion errors in the standard tying systems. Compared to the baseline monophone system, the standard tying systems show a statistically significant degradation in deletion errors (−5% relative improvement; p ≪ 0.0001). This result implies that the standard tying systems are over-generalizing deletion rules.

Note that in the standard tying systems, the phone following the deleted phone is never used to characterize the deletion transitions. In the monophone system, deletion rules are characterized by the phone preceding the deleted phone. Phonologically speaking, a phone is generally influenced more by its following phone than by its preceding phone. In non-rhotic dialects of English, we also know that the right context of /r/ is more important than the left in specifying non-rhoticity. Therefore, without characterizing deletion transitions using the right-context phone, deletion rules are expected to be over-generalized. We expect that including the phone following the deleted phone could characterize deletion transitions more accurately. We discuss these details in the next section.

4.2. Standard Tying vs. Sophisticated Tying

4.2.1. Deletion Errors

The overall PER of the standard and sophisticated tying systems is the same. However, the matched pairs test shows that the two systems are making statistically different errors (p ≪ 0.0001). If we consider deletion errors, the sophisticated tying system beats the standard tying system by 33% relative. Among the /r/'s that were incorrectly deleted by the standard tying system, the sophisticated tying system correctly generated 24%. This result also supports the hypothesis that …

In terms of generating dialect-specific pronunciations, standard and sophisticated tying might not show a statistical difference, but arc clustering is much more suitable for discovering and interpreting deletion rules. The arc clustering scheme explicitly characterizes deletion rules: the decision tree clustering results show potential deletion rule candidates. On the other hand, it is much more challenging to linguistically characterize deletion transformations as phonetic rules in state clustering. Therefore, depending on the needs of the task, different tying schemes …

4.3. Implications for Dialect Recognition

The statistical test evaluates whether two systems make statistically different errors. The significant test results indicate that the two systems being compared make different errors, implying that if the pronunciation models are used in dialect recognition tasks, they should fuse well.
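As an arithmetic check of the 33% relative reduction in deletion errors reported in Section 4.2.1, the deletion rates of the two tying systems (2.1% and 1.4%, as read from Table 2) can be plugged into the usual relative-reduction formula. The helper name is hypothetical.

```python
def relative_reduction(baseline: float, system: float) -> float:
    """Relative error reduction in percent: 100 * (baseline - system) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - system) / baseline


if __name__ == "__main__":
    # Deletion error rates (%) of standard vs. sophisticated tying.
    print(round(relative_reduction(2.1, 1.4), 1))
```

The result, about 33.3%, matches the figure quoted in the text; a negative value would indicate a degradation, as with the standard tying systems' deletion errors relative to the monophone baseline.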

5.  Conclusions 

We propose a sophisticated tying mechanism for modeling deletion transformations between dialects. We empirically show that the proposed tying mechanism reduces deletion errors by 33% when compared to a baseline system using a standard tying mechanism. Statistical tests show that the proposed and baseline models make statistically different errors, thus suggesting that they are complementary systems in dialect recognition tasks. Pronunciation rules learned by our proposed system quantify the occurrence frequency of known rules, and suggest rule candidates for further linguistic studies. Potential applications include forensic phonetics, accent training, and dialect recognition.